add transformer base class #892
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #892      +/-   ##
==========================================
- Coverage   66.58%   64.84%   -1.74%
==========================================
  Files         145      148       +3
  Lines        8828     9140     +312
  Branches     1605     1644      +39
==========================================
+ Hits         5878     5927      +49
- Misses       2633     2897     +264
+ Partials      317      316       -1
```
some comments
more comments
some comments
additional comments
Another comment.
LGTM
This PR is related to the basic operations of Vision Transformer.

- Design a unified data flow for the transformer.
- Design a `BaseTransformerLayer` as the base class of `TransformerEncoderLayer` and `TransformerDecoderLayer` in the vision transformer. It contains several basic operations such as `attention`, `layer norm`, and `FFN`. It can be built from a `ConfigDict` and supports customization; for example, you can specify any number of `FFN` or `LN` layers and use different kinds of `attention` by passing a list of `ConfigDict` named `attn_cfgs`. It is worth mentioning that it supports `prenorm`.
- Design a `TransformerLayerSequence` as the base class of `TransformerEncoder` and `TransformerDecoder` in the vision transformer. It supports customization such as specifying different kinds of `transformer_layer` in `TransformerLayerSequence`.
Details

1 More unified data flow

Design a unified data flow for the vision transformer and add `**kwargs` to adapt to other transformers (e.g., the `level_start_index` and `reference_points` in deformable-detr); a sketch of the shared interface follows.
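A minimal sketch of what this unified interface could look like; the class name and argument names here are illustrative assumptions based on the description above, not necessarily the merged signature:

```python
import torch.nn as nn


class UnifiedLayerSketch(nn.Module):
    """Hypothetical stand-in illustrating the unified data flow."""

    def forward(self, query, key=None, value=None, query_pos=None,
                key_pos=None, attn_masks=None,
                query_key_padding_mask=None, key_padding_mask=None,
                **kwargs):
        # Every layer shares one signature; `**kwargs` forwards
        # transformer-specific extras (e.g. `level_start_index` and
        # `reference_points` in Deformable DETR) without the base class
        # having to know about them.
        ...
```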
2 BaseTransformerLayer

It uses the unified data flow and can be initialized with the basic arguments shared by all `TransformerLayer`s. The `forward` function can be used in both `TransformerEncoderLayer` and `TransformerDecoderLayer` by specifying a different `operation_order`. It supports `pre_norm` when you specify the first operation as `norm` in `operation_order`; see the sketch below.
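A hedged sketch of how `operation_order` could select the layer variant; the operation names follow the description, but exact spellings in the merged code may differ:

```python
# Post-norm encoder layer: attention -> norm -> FFN -> norm.
encoder_order = ('self_attn', 'norm', 'ffn', 'norm')

# Decoder layer: an extra cross-attention (plus norm) before the FFN.
decoder_order = ('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')

# Pre-norm variant: putting `norm` first enables pre_norm.
prenorm_order = ('norm', 'self_attn', 'norm', 'ffn')
```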
3 TransformerLayerSequence

It uses the unified data flow and can be initialized with the basic arguments shared by all `TransformerLayerSequence`s.

Usages
TransformerLayer: you can build a `TransformerEncoderLayer` by giving a `ConfigDict`, which looks like the sketch below.
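A minimal sketch of such a config, assuming registry-style `type` keys; `attn_cfgs` and `operation_order` are named in the PR description, while the attention fields and `feedforward_channels` are illustrative assumptions:

```python
from mmcv.utils import ConfigDict

# Hedged sketch of an encoder-layer config.
encoder_layer_cfg = ConfigDict(
    type='BaseTransformerLayer',
    attn_cfgs=[
        # One attention config per attention op in `operation_order`.
        dict(type='MultiheadAttention', embed_dims=256, num_heads=8),
    ],
    feedforward_channels=1024,
    operation_order=('self_attn', 'norm', 'ffn', 'norm'))
```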
TransformerLayerSequence: you can build it with a similar `ConfigDict`, as sketched below.
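A corresponding sketch for the sequence, assuming it stacks `num_layers` copies of a layer config; the `transformerlayers` argument name is an assumption:

```python
from mmcv.utils import ConfigDict

# Hedged sketch: stack six identical encoder layers into a sequence.
encoder_cfg = ConfigDict(
    type='TransformerLayerSequence',
    transformerlayers=encoder_layer_cfg,  # the layer config sketched above
    num_layers=6)
```

If the registry exposes builder helpers (for example `build_transformer_layer` and `build_transformer_layer_sequence`, an assumption here), these configs would be passed straight to them.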
BC-breaking

None